Device for hierarchical estimation of the motion of image sequences

ABSTRACT

The device includes a set of processing macromodules connected in cascade and organized in accordance with hierarchical levels. Each macromodule is structured so as to partition the current image into macroblocks of a determined size corresponding to its hierarchical level so as to transmit a motion vector field to the block which follows it, and includes first circuits for calculating the displaced inter-image differences DFDi and the gradients on the basis of the values of luminance of the pixels of the video image and of the displacement vectors of each image preceding or following the current image; second circuits for performing blockwise the summations of the displaced inter-image differences and third circuits for performing corrections of the displacement vectors of the image.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a device for hierarchical estimation of the motion of image sequences.

It applies in particular to the production of image transmission chains with a view to reducing the bit rate of the information transmitted.

2. Description of the Background

A process and a device for estimation and hierarchical coding of motion in image sequences are known from French Patent Application No. 89 11328 filed in the name of THOMSON CONSUMER ELECTRONICS. This process is executed in accordance with several levels of processing. At a first level, it consists in partitioning the current image into macroblocks of 2^(P+1)×2^(P+1) pixels and in determining a first motion vector field, associated with this macroblock splitting, by using an iterative and recursive estimation algorithm initialized with motion vectors estimated for the preceding image. At a second level, it consists in partitioning each of the macroblocks into quadrants and in determining, for the blocks which result from this, a second motion vector field using the same estimation algorithm but initialized with vectors from the motion field estimated at the preceding level. Then, at an i^(th) level, it consists in partitioning each of the blocks considered at level i-1 into quadrants and in determining, for the blocks which result from this, an i^(th) motion field using the same estimation algorithm initialized with vectors from the motion field estimated at the preceding level, the minimum size blocks being blocks of 2^(l+1)×2^(l+1) pixels. Lastly, it consists in determining a final motion vector field from the resulting motion vector fields by choosing the lowest level of splitting for which the motion vector associated with the corresponding block leads to the minimization of a criterion conveying the differences in luminance between blocks corresponding to one another in the successive images via the estimated displacement vectors.

This process makes it possible to enhance the convergence of the recursive motion estimation algorithm used and to best adapt the motion estimation algorithm to the "quadtree" type coding used subsequently to code the resulting motion field. However, execution of the temporal loop of the algorithm turns out to consume a great deal of calculation time, and its implementation in an HD.MAC coder requires too large a number of electronic components to be easily integrated with this type of coder.
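The successive quadrant splits described above can be sketched as follows. This is a minimal illustration only: the function names `quadrants` and `partition_levels` are hypothetical, and the sizes 128 and 16 are example values chosen for the sketch, not values stated for the prior-art process.

```python
def quadrants(x, y, size):
    """Split the square block with top-left corner (x, y) into four quadrants."""
    half = size // 2
    return [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]


def partition_levels(top_size, min_size):
    """Return one list of (x, y, size) blocks per level for a single macroblock.

    Level 0 holds the macroblock itself; each further level splits every
    block of the previous level into quadrants until min_size is reached.
    """
    levels = [[(0, 0, top_size)]]
    while levels[-1][0][2] > min_size:
        levels.append([q for block in levels[-1] for q in quadrants(*block)])
    return levels


# Example values (assumed): a 128x128 macroblock split down to 16x16 blocks.
levels = partition_levels(128, 16)
for blocks in levels:
    print(blocks[0][2], len(blocks))
```

Each level quadruples the number of blocks while halving their side, which is why every level can reuse the vectors of the level above as initial values.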

SUMMARY OF THE INVENTION

The object of the invention is to alleviate the aforesaid drawbacks.

For this purpose, the subject of the invention is a device for hierarchical estimation of the motion of image sequences, characterized in that it includes a set of processing macromodules connected in cascade and organized in accordance with hierarchical levels, each macromodule being structured so as to partition the current image into macroblocks of a determined size corresponding to its hierarchical level so as to transmit a motion vector field to the block which follows it, the device comprising first circuits for calculating displaced inter-image differences DFDi and gradients based on luminance values of pixels of a video image and displacement vectors of each image preceding or following a current image; second circuits for performing blockwise summations of the displaced inter-image differences; and third circuits for performing corrections of the displacement vectors of the image.

The main advantage of the invention is to offer a model of hardware architecture which is simpler to produce than that of the prior devices, together with a sharply reduced number of processing levels and much smaller calculation times.

BRIEF DESCRIPTION OF THE DRAWINGS

Other characteristics and advantages of the invention will emerge with the aid of the description which follows, given in connection with the appended figures, which represent:

FIG. 1 an embodiment of a device for hierarchical estimation of motion according to the invention;

FIG. 2 an embodiment of a processing module represented in FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The architecture represented in FIG. 1 is adapted to those which are known from the hierarchical recursive blocks HRB of HD.MAC coders. It includes four hierarchical levels (128, 64, 32 and 16) defined by the size of partition of the blocks of the image and four predictors per level. This architecture is modular and enables the number of levels to be increased or decreased depending on the desired compromise between the accuracy of the results and the bulkiness of the hardware. The device represented includes a set of four processing macromodules referenced from 1 to 4 respectively. Each macromodule carries out the processing of a hierarchical level. They each include three modules: two processing modules 5 and 6 and one decision module 7. Delay modules 8, 9 and 10 are attached respectively to the macromodules 2, 3 and 4. Each of the processing modules carries out in parallel the correction of two predictors. The number of predictors per level can vary by using a greater or lesser number of these modules. For an HD.MAC coder, for example, which includes four predictors, only two processing modules are necessary.

Each processing module includes, in the manner represented in FIG. 2, a set of three memory blocks 11, 12 and 13, blocks 14 and 15 for calculating the displaced inter-image differences and a calculating block 16 for correcting the predictors. As represented in FIG. 2, the image sequences are stored in the three memories 11, 12 and 13, which respectively contain the next image referenced at an instant t+1, the current image referenced at the instant t, and the preceding image referenced at the instant t-1. The memory 11 outputs the luminance values I₁(t+1), I₂(t+1), I₃(t+1), I₄(t+1) of the four pixels surrounding the displaced pixel in the next image, the memory 12 outputs the luminance value I(t)=I(z,t) of the current point z, and the memory 13 outputs the luminance values I₁(t-1), I₂(t-1), I₃(t-1) and I₄(t-1) of the four pixels surrounding the displaced pixel in the preceding image at the instant t-1. A first pointwise calculation step gives the values of displaced inter-frame difference DFD^(i-1) and of gradients Grad_(x)^(i-1), Grad_(y)^(i-1) on the basis of the luminance values coming from the memories and of the initial displacement vectors, with respect to the preceding image at the instant t-1 and with respect to the following image at the instant t+1. More specifically the block 14 calculates the values

    DFD^(i-1)(t+1), Grad_(x)^(i-1)(t+1), Grad_(y)^(i-1)(t+1)

    A = DFD^(i-1)(t+1) · Sign(Grad_(x)^(i-1)(t+1))

    B = DFD^(i-1)(t+1) · Sign(Grad_(y)^(i-1)(t+1))

and the block 15 calculates the values

    DFD^(i-1)(t-1), Grad_(x)^(i-1)(t-1), Grad_(y)^(i-1)(t-1)

    C = DFD^(i-1)(t-1) · Sign(Grad_(x)^(i-1)(t-1))

    D = DFD^(i-1)(t-1) · Sign(Grad_(y)^(i-1)(t-1))

with DFD^(i-1) =I(z,t)-I(z-D^(i-1),t-1).
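The pointwise step can be illustrated with the short sketch below, written for the preceding image (the case handled by the block 15). It is a sketch under stated assumptions: the text says only that the four pixels surrounding the displaced pixel are used, so the bilinear interpolation and the averaged-difference gradients shown here are one plausible choice, and the names `grid_cell`, `dfd_and_gradients` and `sign` are hypothetical.

```python
import math


def grid_cell(img, x, y):
    """Return the four pixels I1..I4 around (x, y) and the fractional offsets.

    No border handling: the point is assumed to lie inside the image.
    """
    x0, y0 = math.floor(x), math.floor(y)
    cell = (img[y0][x0], img[y0][x0 + 1], img[y0 + 1][x0], img[y0 + 1][x0 + 1])
    return cell, x - x0, y - y0


def dfd_and_gradients(cur, ref, z, d):
    """Compute DFD = I(z,t) - I(z-D,t-1) and the gradients at the displaced point.

    Assumption: bilinear interpolation for I(z-D,t-1) and averaged pixel
    differences for the gradients; the source does not give these formulas.
    """
    zx, zy = z
    (i1, i2, i3, i4), fx, fy = grid_cell(ref, zx - d[0], zy - d[1])
    interp = ((1 - fx) * (1 - fy) * i1 + fx * (1 - fy) * i2
              + (1 - fx) * fy * i3 + fx * fy * i4)
    grad_x = ((i2 - i1) + (i4 - i3)) / 2
    grad_y = ((i3 - i1) + (i4 - i2)) / 2
    return cur[zy][zx] - interp, grad_x, grad_y


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


# Toy data: a luminance ramp as preceding image, a flat current image.
ref = [[10 * x + y for x in range(4)] for y in range(4)]
cur = [[20] * 4 for _ in range(4)]
dfd, gx, gy = dfd_and_gradients(cur, ref, z=(2, 2), d=(0.5, 0.0))
c_term = dfd * sign(gx)  # plays the role of C in the text
d_term = dfd * sign(gy)  # plays the role of D in the text
```

The same routine applied to the next image would give the terms A and B, provided the displaced position is taken with respect to the instant t+1.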

The displacement vectors D^(i-1) are calculated by the block 16 according to the iterative formulae: ##EQU1## in which:

D_(x)^(i) and D_(y)^(i) as well as D_(x)^(i-1) and D_(y)^(i-1) are respectively the horizontal and vertical components of the motion vectors D^(i) and D^(i-1) for a block of pixels respectively at iteration i and at the preceding iteration i-1. The block 16 applies these formulae after having calculated the partial sums of the differences C-A and B-D over the whole of the corresponding block of level n.

When the N initial values for each current block (N then generally being chosen equal to 4) correspond to blocks in the neighbourhood of this current block but of a different level, that is to say of immediately greater size, it is possible to initialize 4 separate motion estimates for this block and to obtain, when the iterative formulae of the algorithm have converged, four independent values of updated motion vectors. The updated motion vectors supplement the values of the vectors of preceding levels. The squares of the intermediate inter-frame differences DFD used for the updates are calculated, stored in memory and added up for the whole of the block so as to choose for each block, on completion of the 4 calculations, the "best" displacement out of the four new updated displacement vectors, that is to say the displacement for which the sum S^(i) of the squared inter-frame differences for this block is minimal. This best vector then constitutes one of the vectors of the motion field calculated at level n and denoted DV128 or DV64 or DV32 or DV16 depending on the iteration level.
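The choice of the "best" displacement can be sketched as below. Only the selection rule (minimum sum of squared DFD over the block) is taken from the text; the function name `best_vector` and the sample values are hypothetical, and the update of the vectors themselves is not shown.

```python
def best_vector(candidates, block_dfds):
    """Pick the candidate vector whose squared DFDs sum to the smallest value.

    candidates: list of (dx, dy) updated vectors, one per predictor.
    block_dfds: for each candidate, the DFD values over the pixels of the block.
    Returns the chosen vector and its sum of squared DFDs.
    """
    sums = [sum(v * v for v in dfds) for dfds in block_dfds]
    best = min(range(len(candidates)), key=sums.__getitem__)
    return candidates[best], sums[best]


# Four updated vectors with made-up DFD values for a two-pixel "block".
vectors = [(0, 0), (1, 0), (0, 1), (1, 1)]
dfds = [[3, -2], [1, 1], [0, 4], [-2, -2]]
chosen, s_min = best_vector(vectors, dfds)
```

Squaring before summing prevents positive and negative differences from cancelling within a block.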

The block 16 can also be organized in such a way as to carry out an a priori calculation. In this case the block 16 directly applies to the N initial values used for each current block a decision criterion, by using a DFD calculation and by choosing as a vector the one which gives the minimum DFD. It carries out a correction on this chosen vector in such a way as to obtain the components D^(i)_(x) and D^(i)_(y) of the displacement vector with the iterative formulae given above. Calculation of the sums is performed over each 8×8 block of the current block, and then a sum over the current block is performed according to the relation ##EQU2##

(DFD^(i-1) being the DFD calculated with the vectors D^(i-1), as for Grad^(i-1)).

The block 16 next performs the calculation of the sums over the current block and the calculation of D^(i)_(x) and D^(i)_(y), namely the correction of the predictors D^(i-1).
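The two-stage summation (first over each 8×8 block, then over the current block) can be illustrated as follows. The quantity being summed is left generic because the relation itself is not reproduced in this text; `tile_sums` and `block_sum` are hypothetical names, and the block side is assumed to be a multiple of 8.

```python
def tile_sums(values, tile=8):
    """Return the grid of partial sums, one per tile x tile sub-block.

    values: square 2D list whose side is a multiple of `tile` (assumed).
    """
    n = len(values)
    return [[sum(values[y][x]
                 for y in range(ty, ty + tile)
                 for x in range(tx, tx + tile))
             for tx in range(0, n, tile)]
            for ty in range(0, n, tile)]


def block_sum(values, tile=8):
    """Sum over the whole block, obtained by adding up the partial sums."""
    return sum(sum(row) for row in tile_sums(values, tile))


# A 16x16 block of arbitrary per-pixel values gives a 2x2 grid of partial sums.
block = [[(x + y) % 3 for x in range(16)] for y in range(16)]
partial = tile_sums(block)
total = block_sum(block)
```

Keeping the 8×8 partial sums means that a larger block of any level can be totalled without revisiting individual pixels.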

On completion of this motion estimation phase, a different motion vector field corresponding to each level, that is to say to each block size, is available in order to determine the final motion field adapted to the construction of the coding tree such as a "quadtree" type coding tree.

Having determined the four motion fields, the convergence step consists in determining, in a block referenced by 17 in FIG. 1, the final field of motion. Determination of the final field of motion takes place by examining, for each block of minimum size, namely for each block of size 16×16 pixels in the embodiment represented, the different motion vectors DVi which were established at each level of splitting. Thus, for each block of minimum size, we have a vector DV128 giving a DFD 128 established when initializing the procedure, namely at level 1, then a vector DV64 giving a DFD 64 established at level 2, a vector DV32 giving a DFD 32 established at level 3, and a vector DV16 giving a DFD 16 established at level 4. For these four vectors, the block 17 compares their DFD and chooses the vector DVi giving the smallest DFDi as the vector assigned to this block of minimum size in the final motion vector field. This same selection is made for all the blocks of size 16×16 of a macroblock.
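The selection made by the block 17 for one 16×16 block can be sketched as below. The function name `select_final`, the dictionary layout and the numeric values are hypothetical. On equal DFD values this sketch keeps the first level listed, which is an assumption and not something the text specifies.

```python
def select_final(per_level):
    """Return (level_name, vector) for the level with the smallest DFD.

    per_level: dict mapping a level name to a (vector, dfd) pair for one
    16x16 block. Ties resolve to the first entry in insertion order.
    """
    level = min(per_level, key=lambda name: per_level[name][1])
    return level, per_level[level][0]


# Made-up vectors and DFD sums for a single 16x16 block.
block_candidates = {
    "DV128": ((2, 0), 410),
    "DV64": ((2, 1), 365),
    "DV32": ((3, 1), 380),
    "DV16": ((3, 1), 402),
}
level, vector = select_final(block_candidates)
```

Running this selection for every 16×16 block of a macroblock yields the final motion vector field.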

The various processing modules can be made with the aid of three types of ASIC, one calculating the displaced inter-image differences DFD and the gradients in the video signal, the second performing the blockwise summation of the displaced inter-image differences and the third performing a division of high precision for the correction of the predictor. The attraction of this partition resides in that the ASICs of type 1 and 3 need only very little memory whereas the ASIC of type 2 requires a lot. The problems of production, design, testing and consumption are then split up better. However, it is conceivable that in the near future these three types could be merged into one. The reduction in the bulkiness of this very repetitive part then leads to a not insignificant reduction in the device as a whole.

To avoid the random accessing of the values of the pixels in the preceding and succeeding images for the DFD and gradient calculations being a very memory-expensive operation, a first solution could consist in using four simple circuits providing the four values of the pixels of the grid cell at spot frequency. This makes it possible to have just one channel per predictor, each channel working in parallel. An improvement can also consist of using circuits providing several points of a window pointed at by the predictor, and hence several grid cells in one go. By processing N points in one go (providing N DFDs and N gradients), it is then possible to use the same circuit for N-1 other predictors. This amounts to saying that, through this improvement, which is no more expensive in memory, it is possible to carry out the random accessing of several predictors in series and not in parallel, which significantly reduces the bulkiness.

The architecture has another advantage for the calculation of the sums of displaced inter-image differences DFDi for the levels i (128, 64, 32 and 16) calculated for each block of size 16×16 of a macroblock. Indeed, in the case of a priori calculation the decisions are made by comparing the sums S(i-1), that is to say the sums of DFDi calculated over the predictors, whereas for convergence the decision is made by comparing the sums S(i), that is to say the sums of DFDi calculated over the vectors, summed over blocks of size 16×16 whatever the hierarchical level. In order not to recalculate the sums S(i) from the vectors, the device makes it possible to define the predictors of each level i on the basis of the vectors of level i-1. Thus, performing the sums of the displaced inter-image differences DFDi over the predictors on blocks of size 16×16, whatever i, amounts to calculating directly the sums utilized for the convergence of level i-1, with a very small hardware cost. In an HD.MAC application the sums S(i) 128 are calculated at level 64, the sums S(i) 64 are calculated at level 32, the sums S(i) 32 are calculated at level 16, and the sums S(i-1) 16 are then calculated with a further processing module represented with the reference 18 in FIG. 1, only one channel of which is wired up. Given that it is the decision module which receives the information originating from the processing modules, it is sensible to use it both to generate the predictors of succeeding levels and to provide, on convergence, the vectors of level i and the sums S(i-1). The delay modules 8, 9 and 10 essentially consist of memories for producing the delays for compensating the video signal so as to deliver it correctly in phase to the various hierarchical levels. These modules can be cascaded to allow delay compensation on a single hierarchical level.
Moreover, a delay module 19 is placed at the head of the processing module 1 so as to perform at the head of the system an expansion consisting in doubling the duration of a video line and hence doubling the duration of a frame in the case where the motion field is calculated only every second image, as is the case with HD.MAC. This expansion makes it possible to halve the processing frequency of the whole of the remainder of the device.

We claim:
 1. A device for hierarchical estimation of the motion of image sequences, comprising: a set of cascaded processing macromodules organized in accordance with hierarchical levels, each macromodule of said set of cascaded macromodules being structured so as to partition a current image into blocks of a determined size, a size corresponding to a hierarchical level in order to provide a motion vector field to a block of a lower hierarchical level corresponding to a lower size, first circuits for calculating displaced inter-image differences DFDi and gradients based on luminance values of pixels of the current image and of displacement vectors of each image preceding and following the current image; second circuits for performing summations, on a block, of the displaced inter-image differences; and third circuits for performing corrections of the displacement vectors of the current image.
 2. The device according to claim 1, wherein said each macromodule comprises at least one processing module and a decision module coupled to the at least one processing module.
 3. The device according to claim 2, wherein each processing module comprises means for correcting at least two predictors.
 4. The device according to any one of claims 1 to 3, wherein each macromodule comprises means for summing the displaced inter-image differences of a hierarchical level and for transmitting, to the macromodule of a corresponding lower hierarchical level, the summed displaced inter-image differences.
 5. The device according to any one of claims 1 to 3, wherein the set of cascaded macromodules comprises means for randomly accessing gridded blocks of predictors in the preceding and succeeding images in parallel with an access to the grid cells alone.
 6. The device according to claim 4, wherein the set of cascaded macromodules comprises means for randomly accessing gridded blocks of predictors in the preceding and succeeding images in series with an access to a more extensive window.
 7. The device according to claim 1, wherein the second circuits comprise a summation circuit for calculating a sum for an i-th hierarchical level at an (i-1)^(th) hierarchical level. 